Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overriding retry policy for SMS high priority #2008

Merged
merged 33 commits into from
Nov 6, 2023

Conversation

jimleroyer
Copy link
Member

Summary | Résumé

These changes override the retry policy for SMS high priority.

In case where the Celery workers went down, we want to retry the high priority notifications within a 1 minute time frame to honor its SLO.

Test instructions | Instructions pour tester la modification

No tests at the moment; working on it.

Release Instructions | Instructions pour le déploiement

Test in staging by running performance tests and emulate spikes of notifications bulk that will bring up and down Celery workers.

Reviewer checklist | Liste de vérification du réviseur

This is a suggested checklist of questions reviewers might ask during their
review | Voici une suggestion de liste de vérification comprenant des questions
que les réviseurs pourraient poser pendant leur examen :

  • Is the code maintainable? | Est-ce que le code peut être maintenu?
  • Have you tested it? | L’avez-vous testé?
  • Are there automated tests? | Y a-t-il des tests automatisés?
  • Does this cause automated test coverage to drop? | Est-ce que ça entraîne
    une baisse de la quantité de code couvert par les tests automatisés?
  • Does this break existing functionality? | Est-ce que ça brise une
    fonctionnalité existante?
  • Does this change the privacy policy? | Est-ce que ça entraîne une
    modification de la politique de confidentialité?
  • Does this introduce any security concerns? | Est-ce que ça introduit des
    préoccupations liées à la sécurité?
  • Does this significantly alter performance? | Est-ce que ça modifie de
    façon importante la performance?
  • What is the risk level of using added dependencies? | Quel est le degré de
    risque d’utiliser des dépendances ajoutées?
  • Should any documentation be updated as a result of this? (i.e. README
    setup, etc.) | Faudra-t-il mettre à jour la documentation à la suite de ce
    changement (fichier README, etc.)?

@jimleroyer jimleroyer marked this pull request as draft October 24, 2023 13:26
@jimleroyer jimleroyer marked this pull request as ready for review October 26, 2023 20:53
@sastels
Copy link
Collaborator

sastels commented Oct 26, 2023

This looks good to me. I think that this retry is for when, for example, the boto call fails. If celery itself crashes or is killed outright then I think we have to rely on the SQS retry timeout

(EMAIL_TYPE, PRIORITY, 48, 300),
(SMS_TYPE, BULK, 48, 300),
(SMS_TYPE, NORMAL, 48, 300),
(SMS_TYPE, PRIORITY, 48, 26),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of hard coding these values we could pull them from RETRY_POLICY_DEFAULT and RETRY_POLICY_HIGH

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to the config values. 👍

Copy link
Collaborator

@sastels sastels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! :shipit:

Copy link
Collaborator

@sastels sastels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still all looks good.., Is there an easy way to test locally? I guess the commented out exception you have will do it? ie run and check that it gets retried quickly?

@@ -112,6 +113,7 @@ def _deliver_sms(self, notification_id):
notification = notifications_dao.get_notification_by_id(notification_id)
if not notification:
raise NoResultFound()
# raise Exception("Trigger artificial Celery retry")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 before release

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah heh

@@ -43,7 +43,7 @@
"-l",
"DEBUG",
"-Q",
"database-tasks,-priority-database-tasks.fifo,-normal-database-tasks,-bulk-database-tasks,job-tasks,notify-internal-tasks,periodic-tasks,priority-tasks,bulk-tasks,reporting-tasks,research-mode-tasks,retry-tasks,send-sms-tasks,send-email-tasks,service-callbacks,delivery-receipts",
"database-tasks,-priority-database-tasks.fifo,-normal-database-tasks,-bulk-database-tasks,job-tasks,notify-internal-tasks,periodic-tasks,priority-tasks,normal-tasks,bulk-tasks,reporting-tasks,research-mode-tasks,retry-tasks,send-sms-tasks,send-sms-high,send-sms-medium,send-sms-low,send-throttled-sms-tasks,send-email-high,send-email-medium,send-email-low,send-email-tasks,service-callbacks,delivery-receipts",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jimleroyer
Copy link
Member Author

Is there an easy way to test locally? I guess the commented out exception you have will do it? ie run and check that it gets retried quickly?

Yes that is the strategy I use at the moment, i.e. trigger an exception for every send of SMS so that I can test the retries. 👍

@jimleroyer jimleroyer merged commit e461edc into main Nov 6, 2023
4 checks passed
@jimleroyer jimleroyer deleted the feat/decrease-retry-for-sms-high branch November 6, 2023 22:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants